
Medical Physics

Wiley

Preprints posted in the last 30 days, ranked by how well they match Medical Physics's content profile, based on 14 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Evaluating the Large Language Model-Based Quality Assurance Tool for Auto-Contouring

Tozuka, R.; Akita, T.; Matsuda, M.; Tanno, H.; Saito, M.; Nemoto, H.; Mitsuda, K.; Kadoya, N.; Jingu, K.; Onishi, H.

2026-04-01 radiology and imaging 10.64898/2026.03.31.26349802 medRxiv
Top 0.1%
19.0%

Purpose: Manual verification of AI-based auto-contouring is labor-intensive and prone to fatigue-related errors. This study developed the large language model (LLM)-based automated Quality Assurance (QA) for auto-contouring (LAQUA) system using a multimodal LLM, Gemini 2.5 Pro, and evaluated its feasibility as a clinical primary screening tool to streamline the QA workflow. Methods: Twenty male pelvic CT scans from an open dataset were utilized. Three distinct auto-contouring software packages (OncoStudio, RatoGuide prototype and syngo.via) were evaluated. Auto-contouring results for each slice were exported as PDF images with overlaid contours and input into Gemini 2.5 Pro. The LLM was instructed to rate the contour quality on a 5-point clinical scale (5: Optimal; 4: Acceptable; 3: Suboptimal; 2: Unacceptable, redraw from scratch; 1: Unacceptable, organ not detected). Using evaluations by two board-certified radiation oncologists as ground truth, Spearman's rank correlation coefficients (ρ) and weighted kappa coefficients (κ) were calculated. Additionally, to assess screening performance, sensitivity and specificity were calculated by dichotomizing the scores into "Pass" and "Fail" using two different cutoffs (scores ≥ 3 and ≥ 4 as "Pass"). Finally, the alignment of the rationales provided by the LLM with the auto-contouring quality was evaluated by two board-certified radiation oncologists. This was conducted using a Likert scale assessing four domains (error detection, hallucination, clinical relevance, and anatomical understanding), each scored out of 2 points. Results: The LAQUA system demonstrated moderate to strong agreement with expert judgments across all evaluated organs (ρ: 0.567 - 0.835; quadratic weighted κ: 0.639 - 0.804), with the rectum showing the highest correlation.
Regarding screening performance, a cutoff of ≥ 3 as "Pass" achieved the highest sensitivity and specificity in specific subgroups, but with wide 95% confidence intervals (CIs). A cutoff of ≥ 4 as "Pass" narrowed the CIs, yielding the highest sensitivity in the rectum (0.976) and the highest specificity in the left femoral head (0.933). Qualitatively, the LLM's rationales achieved an overall mean score of 1.70 ± 0.48 (out of 2), with 155 of 291 outputs receiving perfect scores across all criteria. Conclusions: The LAQUA system demonstrated substantial agreement with expert evaluations in AI-based auto-contouring quality assessment. While potential overestimation bias (risk of missing "Fail" cases) warrants caution, the observed sensitivity suggests its feasibility as a primary screening QA tool to efficiently filter acceptable contours, thereby reducing the clinical workload.
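
As a rough illustration of the agreement and screening statistics named in this abstract, the sketch below computes a quadratic weighted kappa on a 1-5 scale and then dichotomizes scores at a cutoff. The ratings are made-up toy values, the function names are hypothetical, and treating "Pass" as the positive class is an assumption; the abstract does not state which class the authors treated as positive.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, k=5):
    """Cohen's kappa with quadratic weights for integer ratings on 1..k."""
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    num = den = 0.0
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            weight = (i - j) ** 2 / (k - 1) ** 2
            num += weight * observed[(i, j)] / n             # observed disagreement
            den += weight * count_a[i] * count_b[j] / n**2   # chance disagreement
    return 1.0 - num / den


def screening_performance(llm, expert, cutoff):
    """Sensitivity and specificity, with 'Pass' (score >= cutoff) as positive (assumed)."""
    pairs = list(zip(llm, expert))
    tp = sum(l >= cutoff and e >= cutoff for l, e in pairs)
    fn = sum(l < cutoff and e >= cutoff for l, e in pairs)
    tn = sum(l < cutoff and e < cutoff for l, e in pairs)
    fp = sum(l >= cutoff and e < cutoff for l, e in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy ratings for illustration only, not data from the study.
llm_scores = [5, 4, 4, 3, 2, 5, 4, 1]
expert_scores = [5, 4, 3, 3, 2, 4, 4, 2]

kappa = quadratic_weighted_kappa(llm_scores, expert_scores)
sensitivity, specificity = screening_performance(llm_scores, expert_scores, cutoff=4)
```

With these toy values, one contour that the expert rated 3 is scored 4 by the LLM, so it passes the ≥ 4 screen incorrectly; this is the kind of overestimation the conclusions warn about.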

2
Visual Fidelity-Driven Quality Assessment of Medical Image Translation

Bizjak, Z.; Zagar, J.; Spiclin, Z.

2026-03-20 radiology and imaging 10.64898/2026.03.18.26348721 medRxiv
Top 0.1%
18.9%

Automated and reliable image quality assessment (IQA) is essential for safe use of medical image synthesis in critical applications like adaptive radiotherapy, treatment planning, or missing-modality reconstruction, where unnoticed generative artifacts may adversely affect outcomes. We evaluated image-to-image translation quality by coupling large-scale expert visual quality assessment with explainable automated IQA modeling. Adversarial diffusion-based framework, SynDiff, was applied to four cross-modality synthesis tasks, including three inter-MR and a CBCT-to-CT translation. Using four-fold cross-validation, ten reference-based and eight no-reference IQA metrics were computed for all synthesized images. Visual IQA ratings were independently collected from thirteen expert raters using predetermined protocol and specialized image viewer enabling blinded, randomized six-point Likert scoring. Auto-Sklearn was employed to learn ensemble regression models mapping IQA metrics to visual consensus ratings, with separate models trained on reference-based and no-reference metrics. The models closely reproduced distribution and ordering of expert ratings, typically within +/- 0.5 Likert points. Reference-based models achieved higher agreement with visual ratings than no-reference models (R^2 0.75 vs. 0.59, resp.), although the latter remained unbiased and informative. Explainability analyses highlighted structure- and contrast-sensitive metrics as key predictors. Overall, the results demonstrate that ensemble regression models can provide transparent, scalable, and clinically meaningful quality control for generative medical imaging.

3
Improving Glioblastoma Classification Using Quantitative Transport Mapping with a Synthetic Data Trained Deep Neural Network

Romano, D. J.; Roberts, A. G.; Weppner, B.; Zhang, Q.; John, M.; Hu, R.; Sisman, M.; Kovanlikaya, I.; Chiang, G. C.; Spincemaille, P.; Wang, Y.

2026-04-01 radiology and imaging 10.64898/2026.03.31.26349864 medRxiv
Top 0.1%
6.7%

Purpose: To develop a deep neural network-based, AIF-free, perfusion estimation method (QTMnet) for improved performance on glioma classification. Methods: A globally defined arterial input function (AIF) is needed to recover perfusion parameters in the two-compartment exchange model (2CXM). We have developed Quantitative Transport Mapping (QTM) to create an AIF-independent estimation method. QTM estimation can be formulated using deep neural networks trained on synthetic DCE-MRI data (QTMnet). Here, we provide a fluid mechanics-based DCE-MRI simulation with exchange between the capillaries and extravascular extracellular space. We implemented tumor ROI generation to morphologically characterize tissue perfusion. We compared our QTMnet implementation with 2CXM on 30 glioma human subjects, 15 of which had low-grade gliomas, and 15 with high-grade glioblastomas. Results: QTMnet outperforms (best AUC: 0.973) traditional 2CXM (best AUC: 0.911) in a glioma grading task. Conclusion: The AIF-independent QTMnet estimation provides a quantitative delineation between low-grade and high-grade gliomas.

4
Assessment of patient radiation dose in conventional lumbar spine radiography: A multicenter study in the Souss Massa region, Morocco

SOUDI, A.; MENHOUR, Y.

2026-03-26 radiology and imaging 10.64898/2026.03.24.26349174 medRxiv
Top 0.1%
6.6%

Background: Patient radiation exposure in diagnostic radiology is an important concern for radiation protection and patient safety. Monitoring radiation dose levels during radiographic examinations is essential to ensure compliance with diagnostic reference levels (DRLs) and to optimize radiological practices. Objective: The aim of this study was to evaluate patient radiation dose during conventional lumbar spine radiography and compare the obtained values with diagnostic reference levels. Methods: A descriptive cross-sectional multicenter study was conducted in four hospitals in the Souss Massa region, Morocco, between April and June 2017. Data were collected from 142 patients undergoing lumbar spine radiography examinations and from 20 radiology technicians. Exposure parameters including tube voltage, tube current, exposure time, focus-to-film distance, and field size were recorded. Entrance surface dose (ESD) was estimated using MICADO software, and dose area product (DAP) values were subsequently calculated. The 75th percentile values were determined and compared with diagnostic reference levels. Results: The regional 75th percentile ESD values were 5.33 mGy for the anteroposterior projection and 7.38 mGy for the lateral projection. Corresponding DAP values were 1840.9 mGy·cm² and 2783.65 mGy·cm², respectively. All obtained values were below the diagnostic reference levels used for comparison. However, variations between hospitals were observed, likely due to differences in imaging protocols and equipment. Conclusion: Radiation doses associated with lumbar spine radiography in the evaluated hospitals were within acceptable limits according to diagnostic reference levels. Continuous monitoring of patient radiation exposure and optimization of radiographic techniques remain essential to ensure effective radiation protection.

5
A Deployable Explainable Deep Learning System for Tuberculosis Detection from Chest X-Rays in Resource-Constrained High-Burden Settings

Agumba, J.; Erick, S.; Pembere, A.; Nyongesa, J.

2026-04-01 radiology and imaging 10.64898/2026.03.31.26349662 medRxiv
Top 0.1%
4.9%

Objectives: To develop and evaluate a deployable deep learning system with Gradient-weighted Class Activation Mapping (Grad-CAM) for tuberculosis screening from chest radiographs and to assess its classification performance and explainability across desktop and mobile deployment platforms. Materials and methods: This study used publicly available chest X-ray datasets containing Normal and Tuberculosis images. A DenseNet121-based transfer learning model was trained using stratified training, validation, and test splits with data augmentation and class weighting. Model performance was evaluated using accuracy, precision, recall, F1 score, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUC). Grad-CAM was used to visualize regions influencing model predictions. The trained model was converted to TensorFlow Lite and deployed in both a Windows desktop application and a Flutter-based mobile application for offline inference and visualization. Results: The model demonstrated strong classification performance on the independent test dataset, with high accuracy and AUC values indicating effective discrimination between Normal and Tuberculosis cases. Grad-CAM visualizations showed that the model focused primarily on anatomically relevant lung regions, particularly the upper and mid-lung fields in Tuberculosis cases. Deployment testing confirmed consistent prediction outputs and Grad-CAM visualizations across both Windows and mobile platforms. Conclusion: The proposed deployable deep learning system with Grad-CAM provides accurate and interpretable tuberculosis screening from chest radiographs and demonstrates feasibility for offline mobile and desktop deployment. This approach has potential as an artificial intelligence-assisted screening and decision support tool in radiology, particularly in resource-limited and remote healthcare settings.

6
AI-Assisted Pneumonia Detection, Localisation and Report Generation from Chest X-rays

Boiardi, F. E.; Lain, A. D.; Posma, J. M.

2026-03-23 radiology and imaging 10.64898/2026.03.20.26348879 medRxiv
Top 0.1%
4.9%

Pneumonia detection in chest X-rays (CXRs) is complicated by high inter-observer variability and overlapping radiographic patterns. While deep learning (DL) solutions show promise, limitations in generalisability and explainability hinder clinical adoption. We address these challenges by introducing a holistic DL-based computer-aided diagnosis (CAD) pipeline for pneumonia detection, localisation, and structured report generation from CXRs. We curated the largest composite of publicly available CXRs to date (N=922,634), of which [Formula] were used for training. MIMIC-CXR radiology reports were relabelled using a local large language model (LLM), positing that LLM-derived pneumonia labels would yield higher diagnostic sensitivity than the provided rule-based natural language processing (rNLP) labels. DenseNet-121 classifiers were trained on four configurations: MIMIC-CXR (rNLP), MIMIC-CXR (LLM), and each supplemented with VinDr-CXR data. Gradient-weighted Class Activation Mapping (Grad-CAM) provided visual explainability and lung zone-based localisation. LLM-driven relabelling significantly improved human-label agreement (96.5% vs 72.5%, P = 1.66 × 10⁻¹¹). The best-performing model (MIMIC-CXR (LLM) + VinDr-CXR) achieved 82.08% sensitivity and 81.97% precision, surpassing both radiologist sensitivity ranges (64-77.7%) and CheXNet's pneumonia F1-score (43.5%). Grad-CAM localisation attained a moderate F1-score of 52.9% (sensitivity=65.7%, precision=44.3%), confirming focus alignment with pathological lung regions while highlighting areas for refinement. These findings demonstrate that LLM-driven label curation, combined with DL, can exceed conventional rNLP and radiologist performance, advancing high-quality data integration in predictive medical imaging. Clinically, our pipeline offers rapid triage, automated report drafting, and real-time pneumonia surveillance; tools that can streamline radiology workflows and mitigate diagnostic errors.

7
Virtual Spectral Decomposition with Dendritic Binary Gating Detects Pancreatic Cancer Tissue Transformation on Standard CT: Multi-Institutional Validation Across Three Independent Datasets with a 3.8-Year Pre-Diagnostic Detection Window

Chandra, S.

2026-04-12 oncology 10.64898/2026.04.08.26350418 medRxiv
Top 0.2%
3.7%

Background. Pancreatic ductal adenocarcinoma (PDAC) has a five-year survival rate of approximately 12%, largely because it is typically diagnosed at an advanced stage. CT-based computational methods for early detection exist but rely on black-box deep learning or large texture feature sets without tissue-specific interpretability. Methods. We developed Virtual Spectral Decomposition (VSD), which applies six parameterized sigmoid functions S(HU) = 1/(1+exp(-alpha x (HU - mu))) to standard portal-venous CT, decomposing each pixel into tissue-specific response channels for fat (mu=-60), fluid (mu=10), parenchyma (mu=45), stroma (mu=75), vascular (mu=130), and calcification (mu=250). Dendritic Binary Gating identifies structural content per channel using morphological filtering, enabling co-firing analysis and lone firer identification. A 25-feature signature was extracted per patient. Three independent datasets were analyzed: NIH Pancreas-CT (n=78 healthy), Medical Segmentation Decathlon Task07 (n=281 PDAC, paired tumor/adjacent tissue), and CPTAC-PDA from The Cancer Imaging Archive (n=82, multi-institutional, with DICOM time point tags). The same six sigmoid parameters were used across all datasets without retraining. Results. VSD achieved AUC 0.943 for field effect detection (healthy vs cancer-adjacent parenchyma) and AUC 0.931 for patient-stratified tumor specification on MSD. On CPTAC-PDA, VSD achieved AUC 0.961 (6 features) and 0.979 (25 features) for distinguishing healthy from cancer-bearing pancreas on scans obtained prior to pathological diagnosis. All significant features replicated across datasets in the same direction: z_fat (d=-2.10, p=3.5e-27), z_fluid (d=-2.76, p=2.4e-38), fire_fat (d=+2.18, p=1.2e-28). Critically, VSD severity did not correlate with days-from-diagnosis (r=-0.008, p=0.944) across a range of day -1394 to day +249. 
Patient C3N-01375, scanned 3.8 years before pathological diagnosis, had VSD severity 1.87, well above the healthy mean of 0.94 +/- 0.33. The tissue transformation signature was temporally stable, indicating an early, persistent tissue state rather than a progressively worsening process. Conclusions. VSD with Dendritic Binary Gating detects a stable pancreatic tissue composition signature on standard CT that is present years before clinical diagnosis, validated across three independent datasets without parameter adjustment. The six sigmoid channels map to biologically meaningful tissue components through a fully transparent interpretability chain. The temporal stability of the signal implies a detection window of 3-7 years, consistent with known PanIN-3 microenvironment transformation timelines. VSD functions as a single-scan screening tool applicable to any abdominal CT performed during the pre-clinical window.
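
The sigmoid channel response described in this abstract, S(HU) = 1/(1+exp(-alpha x (HU - mu))), can be sketched for a single pixel as follows. The six mu centers are taken from the abstract; the steepness `ALPHA` is not reported there, so the value used here is a placeholder assumption, and `channel_responses` is a hypothetical helper name rather than the author's code. The Dendritic Binary Gating step is not sketched.

```python
import math

# Channel centers (mu, in Hounsfield units) as listed in the abstract.
CHANNEL_MU = {
    "fat": -60,
    "fluid": 10,
    "parenchyma": 45,
    "stroma": 75,
    "vascular": 130,
    "calcification": 250,
}

# ASSUMPTION: the abstract does not give alpha; 0.1 is an illustrative value only.
ALPHA = 0.1


def channel_responses(hu, alpha=ALPHA):
    """Return S(HU) = 1 / (1 + exp(-alpha * (HU - mu))) for each channel."""
    return {
        name: 1.0 / (1.0 + math.exp(-alpha * (hu - mu)))
        for name, mu in CHANNEL_MU.items()
    }


# A pixel at 45 HU sits exactly at the parenchyma center, so that channel reads 0.5.
responses = channel_responses(45)
```

Each response rises monotonically with HU, so for a given pixel the channels with lower centers always read higher than those with higher centers.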

8
A Systematic Performance Evaluation of Three Large Language Models in Answering Questions on moderate Hyperthermia

Dennstaedt, F.; Cihoric, N.; Bachmann, N.; Filchenko, I.; Berclaz, L.; Crezee, H.; Curto, S.; Ghadjar, P.; Huebenthal, B.; Hurwitz, M. D.; Kok, P.; Lindner, L. H.; Marder, D.; Molitoris, J.; Notter, M.; Rahman, S.; Riesterer, O.; Spalek, M.; Trefna, H.; Zilli, T.; Rodrigues, D.; Fuerstner, M.; Stutz, E.

2026-03-26 oncology 10.64898/2026.03.25.26349254 medRxiv
Top 0.2%
3.7%

Background: Large Language Models (LLMs) have demonstrated expert-level performance across many medical domains, suggesting potential utility in clinical practice. However, their reliability in the highly specialized domain of moderate hyperthermia (HT) remains unknown. We therefore evaluated the performance of three modern LLMs in answering HT-related questions. Methods: We conducted an evaluation study by posing 40 open-ended questions (22 clinical and 18 physics-related) to three modern LLMs (DeepSeek-V3, Llama-3.3-70B-Instruct, and GPT-4o). Responses were blinded, randomized, and evaluated by 19 international experts with either a clinical or physics background for quality (5-point Likert scale: 1=very bad, 2=bad, 3=acceptable, 4=good, 5=very good) and for potential harmfulness in clinical decision-making. Results: A total of 1144 quality evaluation responses were collected. Overall reported mean quality scores were similar across models, with DeepSeek scoring 3.26, Llama 3.18, and GPT-4o 3.07, corresponding to an "acceptable" rating. Across expert evaluations, responses were considered potentially harmful in 17.8% of cases for DeepSeek, 19.3% for Llama, and 15.3% for GPT-4o. Notably, despite "acceptable" mean scores, approximately 25% of responses were rated "bad" to "very bad," and potentially harmful answers occurred in ~15-19% of evaluations, indicating a non-trivial risk if used without domain expertise. Conclusion: Our findings indicate that the performance of LLMs in HT in versions available at the time of investigation is only partially satisfactory. The proportion of poor-quality responses is too high and may lead non-domain experts to misinterpret the available clinical evidence and draw inappropriate clinical conclusions.

9
The false positive paradox: Examining real-world clinical predictive performance of FDA-authorized AI devices for radiology using clinical prevalence

Sparnon, E.; Stevens, K.; Song, E.; Harris, R. J.; Strong, B. W.; Bruno, M. A.; Baird, G. L.

2026-03-27 radiology and imaging 10.64898/2026.03.25.26349197 medRxiv
Top 0.2%
3.1%

The present study evaluates the real-world clinical predictive performance of FDA-authorized artificial intelligence (AI) devices used in radiology, focusing on the false positive paradox (FPP) and its implications for clinical practice. To do this, we analyzed publicly available FDA data on AI radiology devices from 2024 and 2025 from 510(k) summaries, demonstrating how diagnostic accuracy metrics like sensitivity and specificity do not necessarily translate into high positive predictive value (PPV) due to the influence of target disease prevalence. We show the importance of disclosing the false discovery (FDR) and false omission rates (FOR) and argue that this transparency enables clinicians to select AI systems that balance false positive and false negative costs in a clinically, ethically, and financially appropriate manner. Finally, we provide recommendations for what data should be provided to best serve practices and radiologists.
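
The dependence of predictive values on prevalence that this abstract refers to follows directly from Bayes' rule. The sketch below is a generic illustration with made-up inputs, not figures from the FDA summaries analyzed in the preprint, and `predictive_values` is a hypothetical helper name.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV, NPV, false discovery rate and false omission rate at a given prevalence."""
    tp = sensitivity * prevalence              # true positive fraction
    fn = (1 - sensitivity) * prevalence        # false negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false positive fraction
    tn = specificity * (1 - prevalence)        # true negative fraction
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {"ppv": ppv, "npv": npv, "fdr": 1 - ppv, "for": 1 - npv}


# Illustrative inputs only: a device with 90% sensitivity and 90% specificity.
rare = predictive_values(0.90, 0.90, prevalence=0.01)
balanced = predictive_values(0.90, 0.90, prevalence=0.50)
```

With these inputs the same device has a PPV of 0.9 at 50% prevalence but only about 0.083 at 1% prevalence, so roughly 11 of every 12 positive calls would be false positives in the low-prevalence setting.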

10
Development of a low-dose PBMC humanized mouse model using CD47;Rag2;IL2rγ triple KO mice: Enhanced leukocyte reconstitution and extended experimental window

Heo, S.-H.; Kim, K.-H.; Song, H.-Y.; Lee, S.-w.; Baek, I.-J.; Ryu, J.-W.; Ryu, S.-H.; Seo, S.-M.; Jo, S.-J.

2026-03-30 cancer biology 10.64898/2026.03.25.714298 medRxiv
Top 0.3%
2.4%

Humanized mice (hu-mice), which recapitulate the human immune system, have become increasingly important for preclinical immunotherapy studies. Among these models, the human peripheral blood mononuclear cells (PBMC)-engrafted hu-mice model is the simplest and fastest. However, its utility is hindered by the development of lethal graft-versus-host disease (GvHD) and the insufficient reconstitution of human leukocytes. To address these limitations, we developed PBMC hu-mice models using a novel strain, NOD-CD47nullRag2nullIL-2rγnull (RTKO), focusing on the immunological defects of the NOD strain and the immunotolerance provided by CD47 deficiency. Six-week-old female NOD-Rag2nullIL-2rγnull (RID) and RTKO mice were intravenously injected with three different PBMC doses (3 × 10⁶, 5 × 10⁶, and 1 × 10⁷ cells). At standard doses (5 × 10⁶ and 1 × 10⁷ cells), RTKO mice exhibited enhanced engraftment of human leukocytes, though GvHD was more severe compared to the RID strain, resulting in a limited experimental window. However, in a subsequent trial using a lower dose of PBMCs (3 × 10⁶ cells), RTKO mice demonstrated notable advantages, including stable reconstitution of human leukocytes, milder GvHD symptoms without life-threatening lesions, and a markedly prolonged experimental window. Considering the difficulties in generating hematopoietic stem cell (HSC)-engrafted hu-mice, the extended experimental window provided by this model, which is comparable to HSC hu-mice, is a significant improvement. Moreover, the radiation tolerance conferred by the Rag gene mutation in this model offers another advantage for radiotherapy research. Consequently, the low-dose PBMC RTKO model serves as a versatile and valuable platform for a broad spectrum of immunotherapy studies, especially in the field of immuno-oncology.

11
CorSeg-CineSAX: An Open-Source Deep Learning Framework for Fully Automatic Segmentation of Short-Axis Cine Cardiac MRI Across Multiple Cardiac Diseases

Xu, R.; Jiang, S.; Zhai, Y.; Chen, Y.

2026-04-03 cardiovascular medicine 10.64898/2026.04.01.26349955 medRxiv
Top 0.3%
2.1%

Background: Segmentation of the left ventricular myocardium, left ventricular cavity, and right ventricular cavity on short-axis cine cardiac magnetic resonance (CMR) images is essential for quantifying cardiac structure and function. However, existing automated segmentation tools are limited by small training datasets, narrow disease coverage, restrictive input format requirements, and the absence of anatomical plausibility constraints, hindering their clinical adoption. Methods: We constructed the largest annotated CMR short-axis segmentation dataset to date, comprising 1,555 subjects from 12 centers with five cardiac disease types and full cardiac cycle annotations totaling 319,175 labeled images. A MedNeXt-L model was trained using a 2D slice-by-slice strategy with full field-of-view input, eliminating dependencies on 3D volumes, temporal sequences, or region-of-interest (ROI) localization. A deterministic three-step post-processing pipeline was designed to enforce anatomical priors: connected component constraint, containment relationship constraint, and gap-filling constraint. The model was validated on an internal test set (310 subjects) and three independent public external datasets (ACDC, M&Ms1, M&Ms2; 855 subjects from 6 additional centers across 3 countries), spanning 15 cardiac disease categories, 10 of which were never encountered during training. Results: The model achieved mean Dice similarity coefficients (DSC) of 0.913 ± 0.037 and 0.911 ± 0.040 on internal and external test sets, respectively, with a cross-domain performance gap of only 0.002. Post-processing eliminated all containment violations (7.5% → 0%) and gap errors (1.8% → 0%) while reducing fragment rates by 85.5% (9.0% → 1.3%). Zero-shot generalization to 10 unseen disease categories yielded DSC values ranging from 0.899 to 0.921.
Automated clinical functional parameters demonstrated excellent agreement with manual measurements for left ventricular indices and right ventricular volumes (intraclass correlation coefficients ≥ 0.977). Conclusions: CorSeg-CineSAX provides a robust, open-source framework for fully automatic CMR short-axis segmentation across diverse clinical scenarios. All source code and pre-trained weights are publicly available at https://github.com/RunhaoXu2003/CorSeg.
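
For readers unfamiliar with the Dice similarity coefficient reported in this abstract, the sketch below shows the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), on flattened binary masks. The masks are toy values and `dice` is a hypothetical helper; this is not code from the CorSeg repository, and returning 1.0 for two empty masks is a convention chosen here.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two flattened binary masks."""
    a = {i for i, v in enumerate(mask_a) if v}
    b = {i for i, v in enumerate(mask_b) if v}
    if not a and not b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


# Toy 1D masks for illustration: 3 foreground pixels each, 2 of them shared.
predicted = [0, 1, 1, 1, 0]
reference = [0, 0, 1, 1, 1]
score = dice(predicted, reference)  # 2 * 2 / (3 + 3)
```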

12
Multi-task deep learning integrating pretreatment MRI and whole slide images predicts induction chemotherapy response and survival in locally advanced nasopharyngeal carcinoma

Hou, J.; Yi, X.; Li, C.; Li, J.; Cao, H.; Lu, Q.; Yu, X.

2026-04-11 radiology and imaging 10.64898/2026.04.07.26350350 medRxiv
Top 0.3%
1.9%

Predicting response to induction chemotherapy (IC) and overall survival (OS) is critical for optimizing treatment in patients with locally advanced nasopharyngeal carcinoma (LANPC). This study aimed to develop and validate a multi-task deep learning model integrating pretreatment MRI and whole slide images (WSIs) to predict IC response and OS in LANPC. Pretreatment MRI and WSIs from 404 patients with LANPC were retrospectively collected to construct a multi-task model (MoEMIL) for the simultaneous prediction of early IC response and OS. MoEMIL employed multi-instance learning to process WSIs, PyRadiomics and a convolutional neural network (ResNet50) to extract MRI features, and fused multimodal features through a multi-gate mixture-of-experts architecture. Clustering-constrained attention multiple instance learning and gradient-weighted class activation mapping were applied for visualization and interpretation. MoEMIL effectively stratified patients into good and poor IC response groups, achieving areas under the curve of 0.917, 0.869, and 0.801 in the train, validation, and test sets, respectively, and outperformed the deep learning radiomics model, the pathomics model and TNM staging. The model also stratified patients into high- and low-risk OS groups (P < 0.05). MoEMIL shows promise as a decision-support tool for early IC response prediction and prognostication in LANPC. Author Summary: We have developed a deep learning model that integrates two types of medical images, magnetic resonance imaging (MRI) and digital pathological slices, to simultaneously predict response to induction chemotherapy and prognosis in patients with locally advanced nasopharyngeal carcinoma. Current treatment decisions primarily rely on traditional tumor staging (TNM), which often fails to comprehensively reflect the complexity of the disease.
Our model, named MoEMIL, was trained and tested on data from 404 patients across two hospitals and consistently outperformed both single-model approaches and TNM staging methods. By identifying patients who exhibit poor response to induction chemotherapy or higher prognostic risk, our tool can assist clinicians in achieving personalized treatment, enabling intensified management for high-risk patients and avoiding unnecessary side effects for low-risk patients. Additionally, we visualize the model's reasoning process through heat map generation, which highlights the image regions exerting the greatest influence on prediction outcomes. This work represents a step toward more precise treatment for nasopharyngeal carcinoma; however, larger-scale prospective studies are required before the model can be integrated into routine clinical practice.

13
Usages and perceptions of artificial intelligence among French radiologists

Jean, A.; Benillouche, P.; Jacques, T.

2026-03-26 radiology and imaging 10.64898/2026.03.23.26348621 medRxiv
Top 0.3%
1.8%

This study analyzes the adoption, barriers, and expectations of French radiologists regarding the use of Artificial Intelligence (AI) solutions in their daily practice. Despite a recognition of AI's potential to make radiology more precise, predictive, and personalized, its adoption remains limited. The main obstacles identified are the high cost of those solutions and the insufficient equipment of French imaging centers with AI technologies. Nevertheless, the survey reveals a strong willingness to adopt, with over 70% of radiologists expressing their desire to use AI and 0% declaring a refusal to use it. Furthermore, the radiologists' fears of being replaced by AI are very low (0 to 8.8%).

14
Development and Validation of a Multimodal AI-Based Model for Predicting Post-Prostatectomy Treatment Outcomes from Baseline Biparametric Prostate MRI

Simon, B. D.; Akcicek, E.; Harmon, S. A.; Clifton, L. D.; Thakur, A.; Gurram, S.; Clifton, D.; Wood, B. J.; Karaosmanoglu, A. D.; Choyke, P. L.; Akata, D.; Pinto, P. A.; Turkbey, B.

2026-03-22 urology 10.64898/2026.03.19.26348716 medRxiv
Top 0.3%
1.8%

Prostate cancer (PCa) is the second most common cancer and cause of cancer death in American men. Existing risk prediction methods have limited accuracy and reproducibility, resulting in difficulty in predicting disease severity. We demonstrate the development and external validation of an automated multimodal artificial intelligence algorithm using biparametric MRI (bpMRI) and clinical covariates for predicting biochemical recurrence (BCR) after radical prostatectomy (RP) in PCa patients. The development cohort included 80% of patients from center 1 (n = 240) who underwent prostate MRI prior to RP between January 2008 and December 2018 with a minimum of two years of follow-up after RP. The test cohort included the remaining 20% of center 1 patients (n = 71), and the external validation cohort came from center 2 (n = 168). Center 2 patients included those who underwent prostate MRI and RP between January 2015 and December 2024 with a minimum of two years of follow-up. Clinical comparisons were CAPRA-S (center 1) and ISUP grade group from post-RP biopsy (center 2). Models developed were a clinical model (M0), an automated clinical model (M1), a radiomics model (M2), and a multimodal model (M3). Clinical variables (M0) included PSA, age, primary Gleason, and ISUP grade group. Automated clinical variables (M1 and M3) included PSA and age. Radiomics features (M2 and M3) were extracted from bpMRI using a lesion detection algorithm. Accuracy, sensitivity, specificity, and AUC were calculated, and log-rank tests compared BCR-free survival to assess the model's ability to discriminate relative to clinical standards. Intermediate-risk groups were also assessed. The multimodal model (M3) had the highest AUC across test sets (combined: 0.71; center 1: 0.70; center 2: 0.75) and was the only model to significantly differentiate BCR-free survival outcomes in intermediate-risk groups across both centers (p < 0.05).
This automated multimodal model leveraging radiomics and clinical covariates can predict BCR after RP, approaches clinical gold standards, and may enhance imaging-based prognostication following further validation.

15
Junctional Hounsfield unit ratio: understanding patient-specific vertebral bone strength for proximal junctional kyphosis risk assessment in adult spinal deformity surgery

Nagatani, Y.; Segi, N.; Ito, S.; Ouchida, J.; Yamauchi, I.; Ode, Y.; Okada, Y.; Takeichi, Y.; Tachi, H.; Kagami, Y.; Morishita, K.; Oishi, R.; Miyairi, Y.; Morita, Y.; Ohshima, K.; Oyama, H.; Ogura, K.; Shinjo, R.; Ohara, T.; Tsuji, T.; Kanemura, T.; Imagama, S.; Nakashima, H.

2026-04-06 orthopedics 10.64898/2026.04.05.26349586 medRxiv
Top 0.3%
1.7%

Study design: A retrospective case-control study. Objective: To predict proximal junctional kyphosis (PJK) risk by normalizing individual vertebral bone strength using the ratio of vertebral Hounsfield unit (HU) values around the upper instrumented vertebrae (UIV). Summary of background data: PJK poses a significant challenge in treating patients after adult spinal deformity (ASD) surgery. While the vertebral body HU value is associated with PJK risk, the optimal threshold remains unclear, and a relative assessment of HU values within individuals has not been conducted. Methods: Data on patients who underwent corrective fusion from the middle-to-lower thoracic region to the pelvis for ASD were assessed. The 126 patients were categorized into PJK and non-PJK groups. We compared the patients' backgrounds, vertebral body HU, and junctional HU ratio, defined as the HU value of UIV+1 divided by the HU value of UIV (HU_UIV+1 / HU_UIV). The UIV+2/UIV+1 HU ratio was calculated similarly. Results: The PJK and non-PJK groups included 30 and 96 patients, respectively. After propensity score matching, 28 patients from each group were analyzed. HU values at UIV+2 and UIV+1 (117.0 ± 46.6 vs 145.1 ± 45.9, p=0.018, and 105.5 ± 36.2 vs 147.3 ± 44.9, p<0.001, respectively) were lower in the PJK group. Junctional HU ratio was significantly lower in the PJK group (0.88 ± 0.18 vs 1.13 ± 0.25, p<0.001), and receiver operating characteristic analysis showed that the junctional HU ratio had the highest discriminative ability (area under the curve 0.812). At the optimal cutoff value (HU ratio of 0.905), the sensitivity and specificity for PJK were 64.3% and 89.3%, respectively. Conclusions: A low junctional HU ratio was strongly associated with PJK after ASD surgery. This parameter reflects the bone strength mismatch at the proximal junction and may help improve preoperative risk assessment and UIV selection.
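
The junctional HU ratio defined in this abstract is a simple quotient, sketched below together with the reported cutoff of 0.905. The HU inputs are made-up values, the function names are hypothetical, and flagging ratios strictly below the cutoff is an assumption, since the abstract does not say how a ratio exactly at the cutoff was classified.

```python
PJK_CUTOFF = 0.905  # optimal cutoff reported in the abstract


def junctional_hu_ratio(hu_uiv_plus_1, hu_uiv):
    """HU value of UIV+1 divided by the HU value of UIV."""
    return hu_uiv_plus_1 / hu_uiv


def flags_pjk_risk(hu_uiv_plus_1, hu_uiv, cutoff=PJK_CUTOFF):
    """True when the ratio falls below the cutoff (strict '<' is an assumption)."""
    return junctional_hu_ratio(hu_uiv_plus_1, hu_uiv) < cutoff


# Illustrative HU values only, not patient data from the study.
ratio = junctional_hu_ratio(110.0, 130.0)
at_risk = flags_pjk_risk(110.0, 130.0)
not_at_risk = flags_pjk_risk(150.0, 130.0)
```

Because the ratio compares two vertebrae in the same patient, it captures a relative mismatch at the junction rather than an absolute bone density threshold.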

16
Comparable daughter radionuclide redistribution with superior tumor absorbed dose of the SSTR2 antagonist Ac-DOTA-TATE

Desai, P.; Huber, M.; Mewis, D.; Chouin, N.; Sturzbecher-Hoehne, M.; Gericke, G.; Jaekel, A.

2026-03-18 cancer biology 10.64898/2026.03.16.711095 medRxiv
Top 0.4%
1.7%

It has been hypothesized that effective cellular internalization is required for the retention of 225Ac daughter radionuclides. The complex decay chain of 225Ac and recoil-mediated release of daughters, particularly 213Bi (half-life (t1/2) = 46 min), raise concerns about redistribution that may reduce tumor absorbed dose (TAD) and increase off-target radiation exposure. Because somatostatin receptor subtype 2 (SSTR2) antagonists such as SSO110 are not internalized, it has been proposed that the daughter radionuclides are less effectively retained compared to internalizing agonists such as DOTA-TATE. We therefore performed a direct and quantitative comparison of daughter radionuclide redistribution following administration of [225Ac]Ac-SSO110 and [225Ac]Ac-DOTA-TATE. Methods: Biodistribution and 213Bi redistribution were evaluated in Balb/c nude mice bearing NCI-H69 small cell lung cancer xenografts. Repeated gamma counting combined with bi-exponential modeling was used to quantify 225Ac and 213Bi activity in tumor, blood, bone marrow, kidneys, liver, and intestines up to 96 h post-injection. TAD was calculated with and without accounting for experimentally-derived 213Bi redistribution. Real-time in vitro binding assays were conducted to characterize cellular retention of [225Ac]Ac-SSO110. Results: [225Ac]Ac-SSO110 demonstrated higher tumor uptake and prolonged retention compared with [225Ac]Ac-DOTA-TATE, resulting in a 1.9-fold higher tumor-to-kidney ratio at 96 h and a 2.8-fold higher TAD. Redistribution of 213Bi from tumor was minimal and comparable between agonist and antagonist, with maximum tumor loss of 3.5% for [225Ac]Ac-SSO110 and 2% for [225Ac]Ac-DOTA-TATE. Accounting for daughter redistribution reduced TAD by less than 5% for both radioconjugates. No sustained 213Bi accumulation was observed in blood, kidneys, or liver, and only minimal activity was detected in bone marrow and intestines.
Real-time binding studies demonstrated sustained cell-associated β⁻ signal following incubation with [225Ac]Ac-SSO110. Conclusion: Receptor-mediated internalization is not required for effective retention of 225Ac daughter radionuclides. Despite negligible internalization, [225Ac]Ac-SSO110 achieved superior TAD and higher tumor-to-kidney ratio without increased daughter redistribution compared with the internalizing agonist [225Ac]Ac-DOTA-TATE. These findings question the necessity of internalization for daughter retention and support further evaluation of antagonist-based 225Ac radioligand therapy.
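The abstract above mentions bi-exponential modeling of activity over time without giving the functional form or fitted parameters. The Python sketch below assumes the usual reading, a sum of two decaying exponentials, and shows how such a curve can be evaluated and integrated analytically to obtain a time-integrated activity. All function names and any parameter values passed in are hypothetical; only the 46 min half-life of 213Bi comes from the abstract.

```python
import math

BI213_HALF_LIFE_MIN = 46.0  # half-life of 213Bi in minutes, as stated in the abstract


def decay_constant(half_life: float) -> float:
    """Decay constant lambda = ln(2) / half-life, in inverse units of the half-life."""
    return math.log(2) / half_life


def biexponential(t: float, a1: float, k1: float, a2: float, k2: float) -> float:
    """Assumed model: A(t) = a1 * exp(-k1 * t) + a2 * exp(-k2 * t)."""
    return a1 * math.exp(-k1 * t) + a2 * math.exp(-k2 * t)


def time_integrated_activity(t_end: float, a1: float, k1: float,
                             a2: float, k2: float) -> float:
    """Analytic integral of the assumed model from 0 to t_end (rates must be positive)."""
    if k1 <= 0 or k2 <= 0:
        raise ValueError("rate constants must be positive")
    return sum(a / k * (1.0 - math.exp(-k * t_end))
               for a, k in ((a1, k1), (a2, k2)))
```

In a dosimetry workflow, an integral of this kind would typically feed the absorbed dose calculation. The fitting procedure and dose factors used in the study are not described in the abstract and are not reproduced here.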

17
A Computational Framework for Pulmonary Assessing Wave Intensity Following Simulated Lung Resection

Mackenzie, J. A.; Hill, N. A.

2026-03-18 biophysics 10.64898/2026.03.16.712097 medRxiv
Top 0.4%
1.7%

Background and Objectives: Lung cancer is one of the most frequently diagnosed cancers worldwide. While non-surgical treatment options have increased in number and efficacy, lung resection for primary cancers is still a mainstay of treatment. Lung resection has been shown to impair right ventricular function, although the mechanism for the impairment remains unclear. Wave intensity is increasingly used as a metric for increased post-operative afterload. Here, we develop a computational framework to assess the impact of simulated lung resection on wave intensity to establish that post-operative changes in wave intensity are attributable to the change in pulmonary artery morphometry. Methods: We analyse 48 pulmonary arterial surfaces segmented from CT images in patients with no evidence of lung disease to obtain 1D representations of the pulmonary vasculature. For each pulmonary vasculature we sequentially remove vessel branches to mimic post-operative morphometric changes to the arterial network. Using an established 1D computational flow model, we simulate pulsatile blood flow in 44 pre-operative cases and 1596 post-operative cases. We compute wave intensity in the main, right, and left pulmonary arteries for all simulations. Results: We compare the change in computed wave intensities pre- versus post-operatively to the results of an experimental clinical study comparing pre- and post-operative wave intensity in a 27-patient cohort. We see good agreement between the changes in the parameters of wave intensity between this study and those reported in the clinical study. Further, we capture the changes in flow distribution pre- versus post-operatively, which indicates that the computational model behaves as expected.
Conclusions: In this preliminary study on a computational framework to capture changes in pulmonary arterial haemodynamics following lung resection, we have shown that our model and analysis pipeline are capable of capturing post-operative changes to wave intensity and flow redistribution between the pulmonary arteries following lung resection. These results motivate further research to develop and validate a patient-specific model, which is an area of active research for us.
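The abstract above does not state how wave intensity is computed. A minimal Python sketch of the standard definition, the product of the change in pressure and the change in velocity over one sampling interval, is given below. The function name and the assumption that pressure and velocity are sampled at the same instants are illustrative and not taken from the preprint; a 1D flow model would typically output flow, which would need dividing by cross-sectional area to obtain velocity first.

```python
def wave_intensity(pressure, velocity):
    """Return dI = dP * dU for each consecutive pair of samples.

    pressure and velocity are equal-length sequences sampled at the same times.
    Positive values indicate that forward-travelling waves dominate,
    negative values that backward-travelling waves dominate.
    """
    if len(pressure) != len(velocity):
        raise ValueError("pressure and velocity must have the same length")
    return [
        (pressure[i + 1] - pressure[i]) * (velocity[i + 1] - velocity[i])
        for i in range(len(pressure) - 1)
    ]
```

Because this form depends on the sampling interval, values from signals sampled at different rates are not directly comparable; dividing each difference by the time step is a common normalization.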

18
Differential impact of FLASH and conventional radiotherapy on a pivotal metabolic organ: White Adipose Tissue

Scabia, G.; Furini, G.; Usai, A.; Asero, G.; Guerra, E.; Mota da Silva, E.; Kusmic, C.; Cavalieri, A.; Del Sarto, D.; Costa, M.; Wabitsch, M.; Rossi, F.; Di Pietro, R.; Lattanzio, S.; Luca, T.; Pezzino, S.; Castorina, S.; Cusano, R.; Capaccioli, S.; Gonnelli, A.; Paiar, F.; Di Martino, F.; Cinti, S.; Maffei, M.

2026-04-01 physiology 10.64898/2026.03.30.715260 medRxiv
Top 0.4%
1.6%

BACKGROUND: Subcutaneous white adipose tissue (scWAT), a key metabolic and endocrine organ, is inevitably exposed during radiotherapy (RT). While RT is a cornerstone of cancer treatment, its efficacy is limited by toxicity to surrounding healthy tissues. Ultra-high dose rate (FLASH) RT has emerged as a promising modality capable of preserving tumor control while reducing normal tissue damage - the so-called FLASH effect. Clinical evidence indicates that childhood exposure to conventional (CONV) RT is associated with long-term dysmetabolism and WAT dysfunction. However, the impact of FLASH-RT on WAT has not been investigated. AIM: To compare the effects of FLASH- and CONV-RT on adipocyte function and scWAT homeostasis, and to identify molecular and structural changes associated with each modality. METHODS: We evaluated the effects of FLASH- and CONV-RT on adipocytes and scWAT using a dedicated linear accelerator capable of delivering both modalities. Experiments were performed in the human SGBS preadipocyte/adipocyte cell line and in a mouse model subjected to proximal hind limb irradiation, with analyses conducted 70 days post-exposure. RESULTS: RT impaired adipogenic differentiation in a dose-dependent manner, with a relative sparing effect of FLASH at 4-8 Gy. Mature adipocytes exhibited radioresistance, with protection by FLASH at 8 Gy. In vivo, both regimens reduced fat mass without affecting body weight, with greater loss following CONV-RT. Transcriptomic profiling of scWAT revealed inflammatory and neurodegenerative signatures after CONV-RT, whereas FLASH-RT induced minimal transcriptional changes. Histological and ultrastructural analyses confirmed increased cellular damage, vacuolization, lipid spill-over, and reduced PLIN1 expression, predominantly in CONV-treated mice. CONCLUSIONS: WAT homeostasis is sensitive to conventional RT, whereas FLASH-RT better preserves tissue structure and function, with implications for long-term metabolic health in cancer survivors.

19
HybridNet-XR: Efficient Teacher-Free Self-Supervised Learning for Autonomous Medical Diagnostic Systems in Resource-Constrained Environments.

Mayala, S.; Mzurikwao, D.; Suluba, E.

2026-03-19 health informatics 10.64898/2026.03.16.26348570 medRxiv
Top 0.4%
1.3%

Deep learning model classification on large datasets is often limited in countries with restricted computational resources. While transfer learning can offset these limitations, standard architectures often maintain a high memory footprint. This study introduces HybridNet-XR, a memory-efficient and computationally lightweight hybrid convolutional neural network (CNN) designed to bridge the domain gap in medical radiography using autonomous self-supervised learning protocols. The HybridNet-XR architecture integrates depthwise separable convolutions for parameter reduction, residual connections for gradient stability, and aggressive early downsampling to minimize the video RAM (VRAM) footprint. We evaluated several training paradigms, including teacher-free self-supervised learning (SSL-SimCLR), teacher-led knowledge distillation (KD), and domain-gap (DG) adaptation. Each variant was pre-trained on ImageNet-1k subsets and fine-tuned on the ChestX6 multi-class dataset. Model interpretability was validated through gradient-weighted class activation mapping (Grad-CAM). The performance frontier analysis identified the HybridNet-XR-150-PW (Pre-warmed) as the optimal configuration, achieving a 93.38% average accuracy and 99% AUC while utilizing only 814.80 MB of VRAM. Regarding class-wise accuracy, this variant significantly outperformed standard MobileNetV2 and teacher-led models in critical diagnostic categories, notably Covid-19 (97.98%) and Emphysema (96.80%). Grad-CAM visualizations confirmed that the teacher-free pre-warming phase allows the model to develop sharper, anatomically grounded focus on pathological landmarks compared to distilled models. Specialized pre-warming schedules offer a viable, computationally autonomous alternative to knowledge distillation for medical imaging. 
By eliminating the requirement for high-performance teacher models, HybridNet-XR provides a robust and trustworthy diagnostic foundation suitable for clinical deployment in resource-constrained environments. Author summary: Traditional deep learning models for medical imaging are often too large for the low-power computers available in many global health settings. We developed a new model to bridge this computational gap. We designed HybridNet-XR, a highly efficient AI architecture, and trained it using a "teacher-free" method that doesn't require a massive supercomputer. We found a specific version (H-XR150-PW) that provides high accuracy while using very little memory. Our results show that high-performance diagnostic AI can be deployed on standard, low-cost hardware. Furthermore, using visual heatmaps (Grad-CAM), we proved that the AI correctly identifies medical landmarks like lung opacities, ensuring it is safe and reliable for real-world clinical use.
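The abstract above cites depthwise separable convolutions as the source of its parameter reduction. The arithmetic behind that claim can be sketched in Python as follows. The layer sizes in the usage note are illustrative and are not HybridNet-XR's actual configuration, which the abstract does not give; bias terms are ignored in both counts.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard convolution: one k x k x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise (k x k per input channel) plus pointwise (1 x 1) pair."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def reduction_factor(kernel: int, c_in: int, c_out: int) -> float:
    """How many times fewer weights the separable form needs."""
    return (standard_conv_params(kernel, c_in, c_out)
            / depthwise_separable_params(kernel, c_in, c_out))
```

For a hypothetical 3 x 3 layer mapping 64 channels to 128, the standard form needs 73,728 weights and the separable form 8,768, roughly 8.4 times fewer.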

20
Bioimpedance-assisted characterization of cardiac electroporation and anisotropic homogenization by pulsed field ablation

Jacobs, E. J.; Santos, P. P.; Parizi, S. S.; Dunham, S. N.; Davalos, R. V.

2026-03-20 bioengineering 10.64898/2026.03.18.712769 medRxiv
Top 0.4%
1.3%

Objective: Pulsed field ablation (PFA) relies on irreversible electroporation to create nonthermal cardiac lesions, yet real-time indicators of electroporation progression and validated lethal electric field thresholds remain limited. This study aimed to develop a bioimpedance-based metric for real-time monitoring of cardiac electroporation, evaluate the impact of myocardial anisotropy under electroporation conditions, and derive waveform-specific lethal electric field thresholds. Introduction: Current PFA procedures lack direct intraoperative feedback on lesion formation, and uncertainty remains regarding the role of myocardial fiber orientation in shaping electric field distributions. Because electroporation dynamically alters tissue electrical properties, monitoring these changes during treatment may improve prediction of ablation outcomes. Methods: PFA was delivered to fresh ex vivo porcine ventricular tissue using clinically relevant and energy-matched waveforms with pulse widths from 1 to 100 µs. Inter-burst broadband electrical impedance spectroscopy was performed using a low-voltage diagnostic waveform to quantify burst-resolved impedance changes. Lesions were visualized using metabolic staining, then finite element models incorporating nonlinear electroporation-dependent conductivity were used to compare anisotropic and homogenized electric field distributions. Lethal electric field thresholds were estimated by fitting simulated contours to measured lesion areas and validated using uniform electric fields generated by a parallel electrode array. Results: Across all waveforms, impedance measurements showed a rapid initial decrease followed by stabilization, indicating early electroporation saturation. Burst-to-burst percent change in impedance slope provided a consistent, waveform-agnostic metric of electroporation progression.
Lesion morphology was not systematically influenced by fiber orientation, and modeling demonstrated that electroporation-induced conductivity increases homogenized tissue anisotropy. Lethal electric field thresholds increased with decreasing pulse width, ranging from 517 ± 46 V/cm (100 µs) to 1405 ± 55 V/cm (1 µs), and were validated under uniform field conditions. Conclusion: Bioimpedance-assisted monitoring enables real-time assessment of cardiac electroporation, while electroporation-induced homogenization supports simplified modeling and standardized PFA treatment design.